ABSTRACT 

Captioning is the process of providing a synchronized 
written script (captions) to accompany auditory information. This 
article describes programs available for captioning digital media on 
computers, and discusses the results of a study on color-coding and 
placement of captions. Seventy-two students in the Preparatory 
Studies Program (PSP) at Gallaudet University (Washington, D.C.) 
participated in the study (PSP enrolls deaf and hard-of-hearing 
students and prepares them for college). A 15-minute segment from a 
Disney film was used in the study. Four versions of digital captions 
were prepared: (1) captions color-coded for speaker identification, 
centered at the bottom of the screen; (2) black and white captions, 
centered at the bottom of the screen; (3) color-coded captions with 
placement dependent on the location of the speaker; and (4) black and 
white captions with placement dependent on the speaker's location. 
Results indicate that comprehension is higher when captions are 
color-coded for speaker identification than when captions are black 
and white. There are no significant differences between centered 
captions and captions with variable placement dependent on location 
of the speaker. (AEF) 
Abstract: Captioning is the process of providing a synchronized written script 
(captions) to accompany auditory information. This article describes programs 
available for captioning digital media on computers and reports the results of a study 
on color-coding and placement of captions. Results indicate that comprehension is 
higher when captions are color-coded for speaker identification than when captions 
are black-and-white. There are no significant differences in comprehension between 
centered captions and captions with variable placement dependent on location of the 
speaker. 
[...] developers can synchronize text files to accompany audio, as part of its Ultimedia Tools series. On both PC 
and Macintosh platforms, some video-editing programs support captioning via titling functions (e.g., ATI's 
MediaMerge and Adobe's Premiere; the titling function in Premiere is currently available only in the Macintosh 
version). 

Multimedia programming environments, such as Apple's HyperCard and Asymetrix's Toolbook, can also 
be used for captioning. The research described in this paper involved the use of captioning tools developed, 
using Asymetrix's Toolbook, by Douglas Short of the Institute for Academic Technology. These tools were 
originally developed for providing translations for foreign language operas on laserdiscs (SyncText) and for 
providing notation and/or translation for music and lyrics on CD-Audio discs (CD Time Liner) (Institute for 
Academic Technology, 1992; Short, 1992). 

The SyncText laserdisc captioning program was subsequently enhanced by Short and King (1994) to include 
tools for statistics calculation (e.g., words-per-minute) and other functions (e.g., a function to find words and 
phrases in the caption list) that are helpful during the captioning process. The enhanced tool, CAP-Media LD, 
also meets or exceeds the current specifications for television captioning. For example, the FCC Caption 
Decoder Standard of 1991 (as revised in 1992; Electronic Industries Association [EIA], 1992) mandates that 
seven background and foreground colors be available; CAP-Media LD provides twenty different colors. CAP- 
Media LD can align captions via all methods available for television (left alignment, eight tab-settings, and/or 
starting at any character in the 32-character line possible on television). It also provides automatic centering 
and right-alignment. Line length is not restricted in CAP-Media LD, and a full range of proportional and 
monospace fonts is available. The fonts include all characters in the required and optional font sets, as 
specified for television captioning (EIA, 1992). The maximum number of caption lines is set at four, identical 
to the limit in television captioning (in CAP-Media LD, this limit can be overridden manually, if desired). 
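
The words-per-minute statistic mentioned above can be illustrated with a minimal sketch. The `Caption` record, its field names, and the sample caption text are assumptions made for illustration only; they are not taken from CAP-Media LD, whose internal data format is not described here.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical caption record (not the CAP-Media LD format)."""
    start_s: float  # time the caption appears, in seconds
    end_s: float    # time the caption is removed, in seconds
    text: str


def words_per_minute(captions):
    """Average presentation rate: total words / total on-screen minutes."""
    total_words = sum(len(c.text.split()) for c in captions)
    total_minutes = sum(c.end_s - c.start_s for c in captions) / 60.0
    if total_minutes <= 0:
        raise ValueError("captions must have a positive total duration")
    return total_words / total_minutes


# Invented sample captions, for illustration only.
captions = [
    Caption(0.0, 2.0, "Who goes there?"),
    Caption(2.5, 5.5, "It is only me, the boy."),
]
print(round(words_per_minute(captions), 1))
```

This version counts only the time captions are on screen; a captioner who wants the rate over the whole clip would divide by the clip length instead.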

CAP-Media LD, which works in conjunction with a set of tools called Express Author developed at the 
Institute for Academic Technology, permits the video and caption areas to be displayed full screen or sized to 
any portion of the computer screen. The program includes options for users to customize the target applications 
they develop using CAP-Media LD. For example, users can create lists of captions and graphic charts of video 
segments, which can then be used to control display of the video in non-linear order. Interactive captioning 
is also possible via hyperlinks between the captions and Express Author's glossary tools. Thus, if a glossary 
and hyperlinks to the captions have been created, end-users are able to click on words or phrases as captions 
are displayed. The video is stopped automatically and the user is linked immediately to supporting materials 
(which can include text, video, audio, animation, and/or pictures) for clarification or elaboration. The glossary 
tools also permit end-users to add their own entries to the glossary. 

Finally, CAP-Media LD is designed such that display of non-text objects (e.g., graphic symbols) and other 
Toolbook functions can be synchronized to audio-visual displays in the same way that text is handled. This 
design will allow, for example, synchronization between display of a laserdisc video clip and an audio file being 
used for descriptive video services (WGBH Educational Foundation, 1993) to provide access to non-visual 
information for blind and visually-impaired people. 

Captioning Format Research and Standards 

King and LaSasso (1993-1996) are using CAP-Media LD in a series of studies designed to assess the 
impact of four format features on the comprehension of content and affective response to captions. These 
features include: (a) methods of indicating speaker identification, (b) methods of indicating non-speech 
information such as background noise, music, and emotional tone, (c) placement of captions in relation to the 
video display area, and (d) timing, length, and duration of captions. These studies are part of a project, funded 
by the U.S. Office of Special Education and Rehabilitative Services (OSERS), which is designed to help set 
new standards for computer-based multimedia and digital television captioning. The results of these studies 
are also expected to be generalized to today's analog captioning system. 
[...] this research for several reasons. First, the technology permits the researchers to have precise control 
over the segments of video to be shown and permits non-linear access to the video (ensuring that all subjects 
see exactly the same portion of video and reducing costs associated with counter-balancing the order of video 
clips). Second, computer-based captioning is more flexible than Line21-analog captioning (e.g., more fonts and 
variations such as [...] are possible). [...] computer-based captions [...] captions created in a television 
editing suite (when the captions involve features, such as different fonts, not possible with current 
Line21-analog captions). 

[...] on the format features of captioning (e.g., [...] 1992-1993). Recent research effort has been [...] by the 
Television Decoder Circuitry Act of 1990, which mandated that, effective July 1, 1993, every television 
(13 inches or larger) manufactured in or imported to the United States must have built-in captioning decoder 
circuitry (DuBow, 1991), and by the work of the FCC and Electronic Industries Association (e.g., EIA, 1992; 
Hutchins, 1993) to set new technical standards for television captioning. 

King and LaSasso (1992-1993) surveyed deaf and hard-of-hearing consumers to determine their opinions 
[...] and their reaction to proposed new captioning features. Among the results were indications that caption 
consumers prefer (a) sans serif over serif fonts, (b) captions centered at the bottom of the screen rather than 
captions placed left, center, or right (depending on where the speaker was on the screen), and (c) moderate 
movement of captions to avoid covering on-screen titles (e.g., names of people and sports scores). Consumers 
expressed strong preference for the existing Line21 [...] compared to character-generated proportional fonts 
and [...]. 

[...] question about captioning, caption consumers identified problems related to the amount of captioning, 
accuracy of captioning, obstruction of on-screen titles, and format features such as [...] different speakers (as 
currently done in the Australian captioning system), some requested that the names of speakers be included with 
captions, and others wanted different emotional tones to be represented. 

Effects of Color-Coding and Placement on Comprehension 

[...] publication) and the EIA-608 Report (EIA, 1992), which indicated that caption providers were exploring 
the use of color for speaker [...] for speaker identification). [Note: the primary method used in current 
Line21-analog captioning for television [...] captions are placed [...] in the same location as the last 
caption for that speaker or in a new location, if the speaker has moved.] 

Method 

Subjects. Seventy-two students in the Preparatory Studies Program (PSP) at Gallaudet University 
participated in this study. PSP enrolls deaf and hard-of-hearing students who are not yet ready for college 
[...] mathematics skills, as well as other college preparatory studies. Most of these students (82.6%) are 
profoundly deaf (90+ decibel loss in the better ear) with [...]. More than [...] (56.9%) are [...] with or 
without glasses, are included in [...] (80 students participated in the study; 8 were eliminated due to vision 
problems or additional handicaps). 
Students were randomly assigned to one of four study conditions (described below). Analyses of variance 
indicate that there are no significant differences between the groups on (a) age, F(3,68) = .91, p = .441, 
MSe = 6.75; (b) hearing loss (measured in decibels), F(3,65) = 1.98, p = .126, MSe = 165.04; or (c) reading level 
(measured by scores on the Degrees of Reading Power test), F(3,64) = .81, p = .494, MSe = 54.95. 

Materials. A 15-minute segment from the beginning of Disney's Sword in the Stone was used in this 
study. Digital captions for the video segment were generated, using a 20-point, proportional, sans serif font 
and a full-screen display (the caption area covered the bottom of the video). Four versions were prepared: (a) 
captions color-coded for speaker identification, centered at the bottom of the screen, (b) black-and-white 
captions, centered at the bottom of the screen, (c) color-coded captions with placement dependent on the 
location of the speaker, and (d) black-and-white captions with placement dependent on the location of the 
speaker. The last two conditions emulated the system used by the caption provider who had prepared Line21- 
analog captions for the movie (i.e., captions were placed in the same place as would occur in television 
captioning). 

The test materials for the study involved four separate Toolbook applications, using the digital captions 
(generated by CAP-Media LD) for the four conditions. The instructions, controls for playing each video clip, 
the test question pages, and the captions themselves were identical in all four applications; only the color and 
placement of the captions varied. 

The 15-minute segment was divided into 22 video clips, with stopping points located at phrases where the 
identity of the speaker might be unclear to someone who could not hear the voices of the characters. Twenty 
of the stopping points (the actual test items) were selected from a larger set of potential items in a pilot-test 
conducted with hearing children and deaf and hearing adults. The other two stopping points were for a sample 
test question to ensure subjects understood the task and a final video clip (without a test question) to complete 
the video segment. Questions were in the form, "Who said ..." followed by the caption on the screen at the 
end of the preceding video clip. Answers were in the form of still-frames of the characters from the video, 
identified as A, B, or C. 

Procedures. As previously noted, subjects were randomly assigned to one of the four conditions. Testing 
took place over three days in a given week, with each subject participating in a single test session of 
approximately 45 minutes on one day. Subjects watched the video on a 27" computer monitor in groups of 
2-10 and recorded their answers in paper-based test booklets. Following the test, all subjects were asked to 
respond to a set of Likert-type items concerning their opinions of the captions they saw, and subjects in color- 
coded conditions were asked to name the color used for each character (black-and-white versions of still-frames 
from the video were used as prompts). The five-point, Likert-type scale had choices of "I like it. I like it a 
lot. I have no opinion. I don't like it. I don't like it a lot," with "it" being the feature of interest. 

A factorial analysis of variance was conducted with test scores as the dependent variable and color and 
placement as independent variables. Chi-Square statistics, using an experiment-wise alpha level of .01, were 
calculated to identify those test items that best discriminated among the groups. Descriptive statistics and Chi- 
Square statistics were also used for the Likert-type items on affective responses to the digital captioning. A 
Pearson Product-Moment Correlation was calculated between test scores and scores on the five-item test 
identifying the color for different characters (for only those subjects who saw color-coded captions). 

Results 

Table 1 contains descriptive statistics for test scores (maximum score of 20) for the four conditions. There 
are significant differences between those who saw color-coded captions and those who saw black-and-white 
captions, F(1,68) = 12.55, p = .001, MSe = 14.36. There are no significant differences for (a) placement, 
F(1,68) = .34, p = .561, MSe = 14.36, or (b) the color-by-placement interaction, F(1,68) = 1.89, p = [...], 
MSe = 21.09. 
Table 1 

Mean Test Scores, Standard Deviations, and Number of Subjects for Study Conditions 

                   Centered               Line21 format          COMBINED MEANS 

Color-Coded        16.29 (3.60) (n=17)    17.00 (3.22) (n=18)    16.65 

Black-and-white    14.35 (3.98) (n=17)    12.60 (4.22) (n=20)    13.48 

COMBINED MEANS     15.32                  14.68                  14.99 
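
As a quick arithmetic check, the combined means for the two placement columns in Table 1 appear to be consistent with averages of the cell means weighted by group size. The sketch below assumes that reading; the function name `weighted_mean` is introduced here for illustration, and the inputs are the cell means and group sizes from Table 1.

```python
def weighted_mean(cells):
    """cells: list of (mean, n) pairs; returns the n-weighted mean."""
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


# (cell mean, n) for the color-coded and black-and-white groups in each column.
centered = [(16.29, 17), (14.35, 17)]
line21_format = [(17.00, 18), (12.60, 20)]

print(round(weighted_mean(centered), 2))       # 15.32
print(round(weighted_mean(line21_format), 2))  # 14.68
```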

There are significant group differences in the response patterns of subjects for only two of the test 
questions: #8, χ²(6, N = 72) = 18.65, p = .0048; #16, χ²(6, N = 72) = 24.13, p = .00049. Both test items relate 
to video scenes where the speaker is off-screen and the caption is located near one of the on-screen characters. 
In the test item, the on-screen character is a distractor (incorrect choice). On item #8, 65% of those who saw 
black-and-white, Line21 format captions selected that incorrect choice (compared to 41% of those who saw 
black-and-white, centered captions and only 12% of those who saw color-coded captions). On item #16, 40% 
and 47% of the subjects who saw black-and-white captions (Line21 format and centered, respectively) selected 
the on-screen character (compared to 14% of those who saw color-coded captions). 
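
The reported p values can be checked against the chi-square statistics with a short sketch. For an even number of degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form, so no statistics library is needed. The function name `chi2_sf_even_df` is introduced here for illustration; this is a check on the reported numbers, not the procedure used in the original analysis.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k, P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


print(chi2_sf_even_df(18.65, 6))  # item #8
print(chi2_sf_even_df(24.13, 6))  # item #16
```

Both values come out close to the reported p = .0048 and p = .00049.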

There are no differences among the four groups on their overall affective responses to digital captioning. 
For this analysis, frequencies for "like and like a lot" are collapsed, as are those for "don't like and don't like 
a lot." Approximately three-fourths of the subjects like the captioning (78%), font style (78%), font size 
(70%), and mixed upper- and lower-case of the lettering (80%). Of those who saw color-coded captions, 
79% like them. For placement of captions, 78% of those who saw centered captions like centering, and 71% 
of those who saw Line21-format captions like variable placement. The percentage of subjects who had no 
opinion concerning each feature of the captions ranged from 12% to 21%, and the percentage of subjects who 
did not like the feature ranged from 3% to 14%. 
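
The collapsing step described above (five response choices reduced to three categories before percentages are computed) can be sketched as follows. The response labels, the `COLLAPSE` mapping, and the sample data are assumptions for illustration; they are not the study's data or coding scheme.

```python
from collections import Counter

# Hypothetical labels for the five Likert-type choices, mapped to three categories.
COLLAPSE = {
    "like a lot": "like",
    "like": "like",
    "no opinion": "no opinion",
    "don't like": "don't like",
    "don't like a lot": "don't like",
}


def collapsed_percentages(responses):
    """Percentage of responses in each collapsed category."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(COLLAPSE[r] for r in responses)
    n = len(responses)
    return {cat: 100.0 * counts[cat] / n
            for cat in ("like", "no opinion", "don't like")}


# Invented sample of ten responses, for illustration only.
sample = ["like a lot"] * 4 + ["like"] * 4 + ["no opinion"] + ["don't like a lot"]
print(collapsed_percentages(sample))
```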

Approximately two-thirds (67%) of the subjects who saw color-coded captions could name the colors of 
the captions for all five of the main characters. A moderate, positive correlation exists between test scores and 
the number of colors correctly identified, r = .352, p = .035, n = 36. 
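
The significance of a Pearson correlation is conventionally assessed by converting r to a t statistic with n - 2 degrees of freedom. The sketch below applies that standard conversion to the reported values; it is offered as a check under that assumption, not as the authors' computation, and the function name `t_from_r` is introduced here for illustration.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r against zero (df = n - 2)."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


print(round(t_from_r(0.352, 36), 2))  # t with 34 degrees of freedom
```

The result is roughly 2.19 on 34 degrees of freedom, which is in line with the reported two-tailed p of .035.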

Discussion 

Results of this study show that color-coding has a significant effect on comprehension when 
comprehension is narrowly defined as understanding of speaker identification for potentially-confusing items. 
These results, however, cannot be interpreted to mean that color-coding should replace the current Line21 
format that uses placement for speaker identification. First, color-coding is unusable for that portion of the 
population that is color-blind (thus, redundancy in any coding system that uses color will be necessary). 
Second, color-coding is unwieldy when the number of main characters is large. Third, complex color-coding 
[...] (the caption [...] the complexity of its color-coding system, based on viewer feedback; C. Grimmer, 
personal communication, December 15, 1993). 

Color-coding for speaker identification, in some combination with placement and inclusion of character 
names in the captions for off-screen speakers, should be studied further. When there are a limited number of 
main characters in the story, color-coding, as shown in the present study, appears to increase comprehension. 
The next study by King and LaSasso (1993-1996) will address how and how well children learn, over time, 
a color-coding system used for speaker identification. 

Summary 

[...] on captioning multimedia and digital television. The results have application for both captioning 
environments (for caption providers) and screen design (for multimedia developers). Most important, however, 
is the recognition that [...] like CAP-Media LD are essential if this important audience is to have full and 
equal access to digital media. 